{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "News_Classification.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tOHVDa9DQQR5"
      },
      "source": [
        "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n",
        "\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/classifiers/News_Classification.ipynb)\n",
        "\n",
        "\n",
        "# [DistilBERT Sequence Classification Base - AG News](https://nlp.johnsnowlabs.com/2021/11/21/distilbert_base_sequence_classifier_ag_news_en.html)\n",
        "\n",
        "DistilBERT Model with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.\n",
        "\n",
        "`en.classify.distilbert_sequence.ag_news`  is a fine-tuned DistilBERT model that is ready to be used for Sequence Classification tasks such as sentiment analysis or multi-class text classification and it achieves state-of-the-art performance.\n",
        "\n",
        "\n",
        "We used TFDistilBertForSequenceClassification to train this model and used BertForSequenceClassification annotator in Spark NLP 🚀 for prediction at scale!\n",
        "\n",
        "\n",
        "It can be used to classify news into multiple categories that can be used by  news stations to provide  a better experience to their users. \n",
        "<br>\n",
        "\n",
        "This model can be used to predict the following news categories \n",
        "`Business`, `Sci/Tech`, `Sports`, `World`\n",
        "\n",
        "<br>\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "The data Source used to train this can be found [here](https://huggingface.co/datasets/ag_news)\n",
        "\n",
        "<br>\n",
        "\n",
        "##Benchmark on Dataset \n",
        "![image.png]()"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "##1.Setup Java 8 and NLU"
      ],
      "metadata": {
        "id": "HIxATMI7ixJx"
      }
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SF5-Z-U4jukd",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "1fc3b7bc-858c-4107-aa36-9045bf13bb72"
      },
      "source": [
        "!wget https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh -O - | bash\n",
        "\n",
        "import nlu"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2022-05-19 23:01:01--  https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1665 (1.6K) [text/plain]\n",
            "Saving to: ‘STDOUT’\n",
            "\n",
            "\r-                     0%[                    ]       0  --.-KB/s               Installing  NLU 3.4.4rc1 with  PySpark 3.0.3 and Spark NLP 3.4.3 for Google Colab ...\n",
            "\r-                   100%[===================>]   1.63K  --.-KB/s    in 0.001s  \n",
            "\n",
            "2022-05-19 23:01:01 (1.66 MB/s) - written to stdout [1665/1665]\n",
            "\n",
            "Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease\n",
            "Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease\n",
            "Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\n",
            "Get:4 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]\n",
            "Ign:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease\n",
            "Hit:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release\n",
            "Get:7 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\n",
            "Get:8 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\n",
            "Hit:9 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease\n",
            "Hit:10 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease\n",
            "Hit:11 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease\n",
            "Get:12 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease [21.3 kB]\n",
            "Get:13 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3,199 kB]\n",
            "Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2,277 kB]\n",
            "Get:16 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]\n",
            "Get:17 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [966 kB]\n",
            "Get:18 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [932 kB]\n",
            "Get:19 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1,503 kB]\n",
            "Get:20 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic/main amd64 Packages [44.3 kB]\n",
            "Get:21 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2,765 kB]\n",
            "Get:22 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]\n",
            "Fetched 12.0 MB in 3s (3,506 kB/s)\n",
            "Reading package lists... Done\n",
            "tar: spark-3.0.2-bin-hadoop2.7.tgz: Cannot open: No such file or directory\n",
            "tar: Error is not recoverable: exiting now\n",
            "\u001b[K     |████████████████████████████████| 209.1 MB 65 kB/s \n",
            "\u001b[K     |████████████████████████████████| 144 kB 22.7 MB/s \n",
            "\u001b[K     |████████████████████████████████| 517 kB 63.7 MB/s \n",
            "\u001b[K     |████████████████████████████████| 198 kB 70.3 MB/s \n",
            "\u001b[?25h  Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "##2.Load the mdoel and make Sample Predictions "
      ],
      "metadata": {
        "id": "XiVdjjfzij2R"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "pipeline = nlu.load('en.classify.distilbert_sequence.ag_news')\n",
        "pipeline.predict(\"The Stock Market just crashed as AAPL dropped by 2 %.\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 190
        },
        "id": "zzWPoMYZsVj2",
        "outputId": "7746832d-4bbe-4fe3-bebd-5f83d450ee5b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "distilbert_base_sequence_classifier_ag_news download started this may take some time.\n",
            "Approximate size to download 234.9 MB\n",
            "[OK!]\n",
            "sentence_detector_dl download started this may take some time.\n",
            "Approximate size to download 354.6 KB\n",
            "[OK!]\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  classified_sequence classified_sequence_confidence  \\\n",
              "0            Business                         0.9741   \n",
              "\n",
              "                                            sentence  \n",
              "0  The Stock Market just crashed as AAPL dropped ...  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-73fec4dd-04c9-4084-81c9-7c0e7f825019\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>classified_sequence</th>\n",
              "      <th>classified_sequence_confidence</th>\n",
              "      <th>sentence</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Business</td>\n",
              "      <td>0.9741</td>\n",
              "      <td>The Stock Market just crashed as AAPL dropped ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-73fec4dd-04c9-4084-81c9-7c0e7f825019')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-73fec4dd-04c9-4084-81c9-7c0e7f825019 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-73fec4dd-04c9-4084-81c9-7c0e7f825019');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "##3.Define Sample Sentences"
      ],
      "metadata": {
        "id": "_XcMyXVXsdYW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sample_sentences = [\n",
        "\"Global Warming is a major concern and action needs to be taken swiftly.\",\n",
        "\"Disney Comics was a comic book publishing company operated by The Walt Disney Company which ran from 1990 to 1993.\",\n",
        "\"Fans get ready as the next fifa game gets close to realsing.\",\n",
        "\"Nasa makes great progress on it's mission of landing on mars\"\n",
        "]"
      ],
      "metadata": {
        "id": "c_ty0gP6sglE"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "##4.Predict on Sample Sentences"
      ],
      "metadata": {
        "id": "qpD0_HqksiLw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "pipeline.predict(sample_sentences)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 175
        },
        "id": "zynQvfNmWI50",
        "outputId": "ffafaa3f-45a2-49bc-a247-394d27370e97"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  classified_sequence classified_sequence_confidence  \\\n",
              "0               World                       0.567128   \n",
              "1            Business                       0.899087   \n",
              "2              Sports                       0.452134   \n",
              "3            Sci/Tech                       0.521322   \n",
              "\n",
              "                                            sentence  \n",
              "0  Global Warming is a major concern and action n...  \n",
              "1  Disney Comics was a comic book publishing comp...  \n",
              "2  Fans get ready as the next fifa game gets clos...  \n",
              "3  Nasa makes great progress on it's mission of l...  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-99ba8319-1c57-42ad-bcab-0c82b14319ca\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>classified_sequence</th>\n",
              "      <th>classified_sequence_confidence</th>\n",
              "      <th>sentence</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>World</td>\n",
              "      <td>0.567128</td>\n",
              "      <td>Global Warming is a major concern and action n...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Business</td>\n",
              "      <td>0.899087</td>\n",
              "      <td>Disney Comics was a comic book publishing comp...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Sports</td>\n",
              "      <td>0.452134</td>\n",
              "      <td>Fans get ready as the next fifa game gets clos...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Sci/Tech</td>\n",
              "      <td>0.521322</td>\n",
              "      <td>Nasa makes great progress on it's mission of l...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-99ba8319-1c57-42ad-bcab-0c82b14319ca')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-99ba8319-1c57-42ad-bcab-0c82b14319ca button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-99ba8319-1c57-42ad-bcab-0c82b14319ca');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
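    {
      "cell_type": "markdown",
      "source": [
        "The confidences above range from about 0.45 to 0.90, so it can help to flag low-confidence predictions for manual review. Below is a minimal sketch, assuming the prediction result is a pandas DataFrame with the `classified_sequence` and `classified_sequence_confidence` columns shown above. The hand-built `preds` frame, the extra `confidence` column and the 0.6 threshold are illustrative assumptions, not part of NLU.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Hand-built stand-in for the pipeline.predict(...) output shown above.\n",
        "preds = pd.DataFrame({\n",
        "    'classified_sequence': ['World', 'Business', 'Sports', 'Sci/Tech'],\n",
        "    'classified_sequence_confidence': ['0.567128', '0.899087', '0.452134', '0.521322'],\n",
        "})\n",
        "\n",
        "# The confidence column may come back as strings, so cast defensively.\n",
        "preds['confidence'] = pd.to_numeric(preds['classified_sequence_confidence'])\n",
        "\n",
        "THRESHOLD = 0.6  # hypothetical cut-off; tune it for your use case\n",
        "low_confidence = preds[preds['confidence'] < THRESHOLD]\n",
        "print(low_confidence[['classified_sequence', 'confidence']])\n",
        "```\n",
        "\n",
        "With this threshold only the `Business` row passes; the other three rows would be flagged for review."
      ],
      "metadata": {}
    },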
    {
      "cell_type": "markdown",
      "source": [
        "##5.Take a look at the parmaters of the pipeline"
      ],
      "metadata": {
        "id": "cEf6CWsxtDWJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "pipeline.print_info()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3TgLjxNltPCp",
        "outputId": "2e827526-a84d-41e0-e31c-5eb0e4c95214"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n",
            ">>> component_list['distil_bert_for_sequence_classification'] has settable params:\n",
            "component_list['distil_bert_for_sequence_classification'].setActivation('softmax')                         | Info: Whether to calcuate logits via Softmax or Sigmoid. Default is Softmax | Currently set to : softmax\n",
            "component_list['distil_bert_for_sequence_classification'].setCoalesceSentences(False)                      | Info: Instead of 1 class per sentence (if inputCols is '''sentence''') output 1 class per document by averaging probabilities in all sentences. | Currently set to : False\n",
            "component_list['distil_bert_for_sequence_classification'].setBatchSize(32)                                 | Info: Size of every batch | Currently set to : 32\n",
            "component_list['distil_bert_for_sequence_classification'].setMaxSentenceLength(512)                        | Info: Max sentence length to process | Currently set to : 512\n",
            "component_list['distil_bert_for_sequence_classification'].setCaseSensitive(True)                           | Info: whether to ignore case in tokens for embeddings matching | Currently set to : True\n",
            ">>> component_list['document_assembler'] has settable params:\n",
            "component_list['document_assembler'].setCleanupMode('shrink')                                              | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n",
            ">>> component_list['tokenizer'] has settable params:\n",
            "component_list['tokenizer'].setTargetPattern('\\S+')                                                        | Info: pattern to grab from text as token candidates. Defaults \\S+ | Currently set to : \\S+\n",
            "component_list['tokenizer'].setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"])  | Info: character list used to separate from token boundaries | Currently set to : ['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]\n",
            "component_list['tokenizer'].setCaseSensitiveExceptions(True)                                               | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n",
            "component_list['tokenizer'].setMinLength(0)                                                                | Info: Set the minimum allowed legth for each token | Currently set to : 0\n",
            "component_list['tokenizer'].setMaxLength(99999)                                                            | Info: Set the maximum allowed legth for each token | Currently set to : 99999\n"
          ]
        }
      ]
    },
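    {
      "cell_type": "markdown",
      "source": [
        "The `setActivation` parameter listed above switches between softmax and sigmoid when the classifier's raw scores are turned into confidences. Below is a minimal pure-Python sketch of the difference. The logit values are made up for illustration and are not taken from the model, and the helper functions are not part of NLU or Spark NLP.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "def softmax(logits):\n",
        "    # Subtract the max for numerical stability; the outputs sum to 1.\n",
        "    m = max(logits)\n",
        "    exps = [math.exp(x - m) for x in logits]\n",
        "    total = sum(exps)\n",
        "    return [e / total for e in exps]\n",
        "\n",
        "def sigmoid(logits):\n",
        "    # Each score is squashed independently; the outputs need not sum to 1.\n",
        "    return [1.0 / (1.0 + math.exp(-x)) for x in logits]\n",
        "\n",
        "labels = ['Business', 'Sci/Tech', 'Sports', 'World']\n",
        "logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical raw scores\n",
        "\n",
        "for name, fn in [('softmax', softmax), ('sigmoid', sigmoid)]:\n",
        "    scores = fn(logits)\n",
        "    print(name, dict(zip(labels, [round(s, 3) for s in scores])))\n",
        "```\n",
        "\n",
        "Softmax suits single-label tasks like this one, where exactly one category applies; sigmoid is the usual choice when several labels can apply at once."
      ],
      "metadata": {}
    },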
    {
      "cell_type": "markdown",
      "source": [
        "Looking Good! Let's test this model on a labelled dataset to see how it performs "
      ],
      "metadata": {
        "id": "1hF8dmLIkbSz"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "##6.Download Data\n",
        "\n",
        "we are going to test the model on [this](https://www.kaggle.com/datasets/rmisra/news-category-dataset?resource=download) dataset \n",
        "\n",
        "This dataset contains around 200k news headlines from the year 2012 to 2018 obtained from HuffPost. The model trained on this dataset could be used to identify tags for untracked news articles or to identify the type of language used in different news articles.\n",
        "\n",
        "We filterd the dataset to include only the labels the model supports .\n"
      ],
      "metadata": {
        "id": "uRbusvKOkVac"
      }
    },
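    {
      "cell_type": "markdown",
      "source": [
        "The filtering step itself is not shown in this notebook. Below is a minimal sketch of how such a filter could look with pandas, assuming the data has a `category` column. The tiny `raw` frame and its extra categories are invented for illustration, and label names in the original dataset may first need to be mapped to the model's labels.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "# Labels the model can predict, as listed at the top of this notebook.\n",
        "SUPPORTED_LABELS = ['Business', 'Sci/Tech', 'Sports', 'World']\n",
        "\n",
        "# Invented toy data; the real dataset has many more categories and rows.\n",
        "raw = pd.DataFrame({\n",
        "    'category': ['World', 'Comedy', 'Sports', 'Travel', 'Business'],\n",
        "    'headline': ['h1', 'h2', 'h3', 'h4', 'h5'],\n",
        "})\n",
        "\n",
        "# Keep only rows whose label is one the model supports.\n",
        "filtered = raw[raw['category'].isin(SUPPORTED_LABELS)].reset_index(drop=True)\n",
        "print(filtered)\n",
        "```"
      ],
      "metadata": {}
    },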
    {
      "cell_type": "code",
      "source": [
        "!wget http://ckl-it.de/wp-content/uploads/2022/04/Data.csv"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KvD0rZTBrnrL",
        "outputId": "ea7a9227-3d17-4328-d048-ddf0a2e653f3"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2022-05-19 22:55:49--  http://ckl-it.de/wp-content/uploads/2022/04/Data.csv\n",
            "Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n",
            "Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 5551258 (5.3M) [text/csv]\n",
            "Saving to: ‘Data.csv’\n",
            "\n",
            "Data.csv            100%[===================>]   5.29M  2.40MB/s    in 2.2s    \n",
            "\n",
            "2022-05-19 22:55:52 (2.40 MB/s) - ‘Data.csv’ saved [5551258/5551258]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas  as pd \n",
        "df = pd.read_csv(\"Data.csv\")\n",
        "df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "n-6bOT4krpr8",
        "outputId": "24e2cb3b-13a5-4cae-ed01-ec562afff02e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "       Unnamed: 0  category  \\\n",
              "0              11     World   \n",
              "1              23     World   \n",
              "2              24     World   \n",
              "3              25     World   \n",
              "4              26     World   \n",
              "...           ...       ...   \n",
              "17253      200848  Sci/Tech   \n",
              "17254      200849    Sports   \n",
              "17255      200850    Sports   \n",
              "17256      200851    Sports   \n",
              "17257      200852    Sports   \n",
              "\n",
              "                                                headline  \\\n",
              "0      South Korean President Meets North Korea's Kim...   \n",
              "1      North Korea Still Open To Talks After Trump Ca...   \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...   \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...   \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...   \n",
              "...                                                  ...   \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...   \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...   \n",
              "17255  Giants Over Patriots, Jets Over Colts Among  M...   \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...   \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...   \n",
              "\n",
              "                                     authors  \\\n",
              "0                                        NaN   \n",
              "1      Josh Smith and Christine Kim, Reuters   \n",
              "2                                        NaN   \n",
              "3                           Antonia Blumberg   \n",
              "4                                        NaN   \n",
              "...                                      ...   \n",
              "17253                       Reuters, Reuters   \n",
              "17254                                    NaN   \n",
              "17255                                    NaN   \n",
              "17256                                    NaN   \n",
              "17257                                    NaN   \n",
              "\n",
              "                                                    link  \\\n",
              "0      https://www.huffingtonpost.com/entry/south-kor...   \n",
              "1      https://www.huffingtonpost.com/entry/north-kor...   \n",
              "2      https://www.huffingtonpost.com/entry/mississau...   \n",
              "3      https://www.huffingtonpost.com/entry/irish-tra...   \n",
              "4      https://www.huffingtonpost.com/entry/ireland-a...   \n",
              "...                                                  ...   \n",
              "17253  https://www.huffingtonpost.com/entry/rim-ceo-t...   \n",
              "17254  https://www.huffingtonpost.com/entry/maria-sha...   \n",
              "17255  https://www.huffingtonpost.com/entry/super-bow...   \n",
              "17256  https://www.huffingtonpost.com/entry/aldon-smi...   \n",
              "17257  https://www.huffingtonpost.com/entry/dwight-ho...   \n",
              "\n",
              "                                       short_description        date  \n",
              "0      The two met to pave the way for a summit betwe...  2018-05-26  \n",
              "1      Trump’s announcement came after repeated threa...  2018-05-25  \n",
              "2      Fifteen people were taken to the hospital, thr...  2018-05-25  \n",
              "3      Just try to read these #HomeToVote tweets with...  2018-05-25  \n",
              "4                     Vote counting will begin Saturday.  2018-05-25  \n",
              "...                                                  ...         ...  \n",
              "17253  Verizon Wireless and AT&T are already promotin...  2012-01-28  \n",
              "17254  Afterward, Azarenka, more effusive with the pr...  2012-01-28  \n",
              "17255  Leading up to Super Bowl XLVI, the most talked...  2012-01-28  \n",
              "17256  CORRECTION: An earlier version of this story i...  2012-01-28  \n",
              "17257  The five-time all-star center tore into his te...  2012-01-28  \n",
              "\n",
              "[17258 rows x 7 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d285dd7b-8206-4ab7-8e9a-c70c483752fd\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>category</th>\n",
              "      <th>headline</th>\n",
              "      <th>authors</th>\n",
              "      <th>link</th>\n",
              "      <th>short_description</th>\n",
              "      <th>date</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>11</td>\n",
              "      <td>World</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/south-kor...</td>\n",
              "      <td>The two met to pave the way for a summit betwe...</td>\n",
              "      <td>2018-05-26</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>23</td>\n",
              "      <td>World</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "      <td>Josh Smith and Christine Kim, Reuters</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/north-kor...</td>\n",
              "      <td>Trump’s announcement came after repeated threa...</td>\n",
              "      <td>2018-05-25</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>24</td>\n",
              "      <td>World</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/mississau...</td>\n",
              "      <td>Fifteen people were taken to the hospital, thr...</td>\n",
              "      <td>2018-05-25</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>25</td>\n",
              "      <td>World</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "      <td>Antonia Blumberg</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/irish-tra...</td>\n",
              "      <td>Just try to read these #HomeToVote tweets with...</td>\n",
              "      <td>2018-05-25</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>26</td>\n",
              "      <td>World</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/ireland-a...</td>\n",
              "      <td>Vote counting will begin Saturday.</td>\n",
              "      <td>2018-05-25</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17253</th>\n",
              "      <td>200848</td>\n",
              "      <td>Sci/Tech</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "      <td>Reuters, Reuters</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/rim-ceo-t...</td>\n",
              "      <td>Verizon Wireless and AT&amp;T are already promotin...</td>\n",
              "      <td>2012-01-28</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17254</th>\n",
              "      <td>200849</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/maria-sha...</td>\n",
              "      <td>Afterward, Azarenka, more effusive with the pr...</td>\n",
              "      <td>2012-01-28</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17255</th>\n",
              "      <td>200850</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among  M...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/super-bow...</td>\n",
              "      <td>Leading up to Super Bowl XLVI, the most talked...</td>\n",
              "      <td>2012-01-28</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17256</th>\n",
              "      <td>200851</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/aldon-smi...</td>\n",
              "      <td>CORRECTION: An earlier version of this story i...</td>\n",
              "      <td>2012-01-28</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17257</th>\n",
              "      <td>200852</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/dwight-ho...</td>\n",
              "      <td>The five-time all-star center tore into his te...</td>\n",
              "      <td>2012-01-28</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>17258 rows × 7 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-d285dd7b-8206-4ab7-8e9a-c70c483752fd')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-d285dd7b-8206-4ab7-8e9a-c70c483752fd button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-d285dd7b-8206-4ab7-8e9a-c70c483752fd');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#The model makes prediction on the text column by default  \n",
        "df['text'] = df['headline']+df['short_description']"
      ],
      "metadata": {
        "id": "bVAaP78d5QQJ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's take  a Peek at the distribution of the labels "
      ],
      "metadata": {
        "id": "mUl3_RmhuSHW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df.category.value_counts().plot.barh(title='Distribution of Labels')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 298
        },
        "id": "7b9cKEyMtbWL",
        "outputId": "6350bfa4-13ea-443e-c9fe-94412b8f2ba1"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<matplotlib.axes._subplots.AxesSubplot at 0x7f2b4b421e10>"
            ]
          },
          "metadata": {},
          "execution_count": 3
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ],
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZAAAAEICAYAAABxiqLiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWp0lEQVR4nO3de5RlZX3m8e9jcxPBBm2Wti3SoIyKogitBoKCl6gIUcdFRowXUDMMJouJt0QY1KCJE4zGGFEHERHjFcUYFaLgDW8o2CDQoIAIjTSgCGgLCATa3/xx3oJDWdXd9XZVna7u72ets2rvd+/97t9bHM5z3r1Pn0pVIUnSVN1n1AVIkuYmA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFANGckuS4JG+epr4eluSWJPPa+plJ/mI6+m79fTnJwdPV3xTO+w9Jbkjyi2nsc3GSSrLJbB6r9ZsBovVGkuVJbktyc5LfJDkryWFJ7n6eVtVhVfX3a9nXM1e3T1X9vKq2qqpV01D70Uk+Pq7//arqo+va9xTreBjwemCXqnrwBNv3TbJiNmvShssA0frmT6tqa2AH4BjgjcCHp/skG/C74YcBN1bV9aMuRBs+A0TrpapaWVVfBF4EHJzksQBJTkryD215QZJT22zlpiTfSXKfJB9j8EL6pXaJ6m+HLqO8KsnPgW9Mcmnl4UnOSfLbJF9I8oB2rj945z42y0nyHOD/AC9q57ugbb/7klir601JrkpyfZJ/SzK/bRur4+AkP2+Xn46a7HeTZH47/letvze1/p8JfBV4SKvjpKn8zpPsn+RHbexXJzl6gt1emeTaJNclecPQsfdJckSSnyW5Mclnxn53E5znkCRXtJnmlUleMpU6tf4wQLReq6pzgBXAUybY/Pq2bTvgQQxexKuqXgb8nMFsZquq+qehY/YBHg08e5JTvhx4JbAQuAt471rU+BXg/wInt/M9foLdDmmPpwE7AVsB7xu3z97AI4FnAG9J8uhJTnksML/1s0+r+RVV9TVgP+DaVscha6p9nFtbX9sA+wOvTvKCcfs8DdgZeBbwxqHLhIcDL2j1PAT4NfD+8SdIcj8Gv9P92kxzL+D8Kdap9YQBorngWmCid7N3Mnih36Gq7qyq79Sav9zt6Kq6tapum2T7x6rqoqq6FXgz8D/GbrKvo5cA766qK6rqFuBI4KBxs5+3VtVtVXUBcAHwB0HUajkIOLKqbq6q5cA/Ay9b1wKr6syqWlZVv6+qC4FPMQiEYW9tv79lwEeAF7f2w4CjqmpFVd0BHA0cOMmlwt8Dj01y36q6rqouXtfaNRoGiOaCRcBNE7S/E7gcOKNdEjliLfq6egrbrwI2BRasVZWr95DW33DfmzCYOY0Z/tTU7xjMUsZb0Goa39eidS0wyZOTfLNdGlvJIBTGj3387+chbXkH4PPtcuJvgJ8Aq7j3+GjB/KLW93VJTkvyqHWtXaNhgGi9luSJDF4cvzt+W3sH/vqq2gl4HvC6JM8Y2zxJl2uaoWw/tPwwBrOcGxhc3tlyqK55DC6drW2/1zJ4kR3u+y7gl2s4brwbWk3j+7pmiv1M5JPAF4Htq2o+cByQcfuM//1c25avZnBZapuhxxZV9Qd1VdXpVfUnDGaPlwAfmobaNQIGiNZLSe6f5ADg08DH2yWT8fsckOQRSQKsZPCO9/dt8y8Z3COYqpcm2SXJlsDbgFPax3wvA7ZoN5o3Bd4EbD503C+BxcMfOR7nU8Brk+yYZCvuuWdy11SKa7V8Bnh7kq2T7AC8Dvj46o+8tyRbjHsE2Bq4qapuT/Ik4M8nOPTNSbZM8hjgFcDJrf24VtMOrf/tkjx/gvM+KMnz272QO4BbuOe/meYYA0Trmy8luZnBO9qjgHczeKGayM7A1xi8CH0f+EBVfbNt+0fgTe2SyhsmOX4iHwNOYnA5aQvgf8PgU2HAXwInMH
i3fyuDG/hjPtt+3pjkvAn6PbH1/W3gSuB2Bjeeexzezn8Fg5nZJ1v/a2sRcNu4x8MZjO9t7ff/FgZBNd63GFw2/Drwrqo6o7X/K4PZyxnt+B8AT57g+PswCLxrGVyW3Ad49RRq13ok/kEpSVIPZyCSpC4GiCSpiwEiSepigEiSumyoXyg3oQULFtTixYtHXYYkzRnnnnvuDVW13UTbNqoAWbx4MUuXLh11GZI0ZyS5arJtXsKSJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZaP6MsVl16xk8RGnjboMTZPlx+w/6hKkjZozEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVKXGQ2QJP+S5DVD66cnOWFo/Z+TvG4t+zopyYETtO+b5NTpqViStLZmegbyPWAvgCT3ARYAjxnavhdw1po6STJvRqqTJHWb6QA5C9izLT8GuAi4Ocm2STYHHg3MT/KjJMuSnNjaSbI8yTuSnAf82XCnSZ6T5JK27YUzPAZJ0gRmNECq6lrgriQPYzDb+D5wNoNQWQL8FDgBeFFV7crgyx1fPdTFjVW1e1V9eqwhyRbAh4A/BfYAHjyTY5AkTWw2bqKfxSA8xgLk+0PrK4Arq+qytu9HgacOHXvyBP09qh3z06oq4OOrO3mSQ5MsTbJ01e9WrttIJEl3m40AGbsPsiuDS1g/YDAD2Qs4cw3H3rquJ6+q46tqSVUtmbfl/HXtTpLUzNYM5ADgpqpaVVU3AdswCJHPAYuTPKLt+zLgW2vo75J2zMPb+otnoGZJ0hrMRoAsY/Dpqx+Ma1tZVSuAVwCfTbIM+D1w3Oo6q6rbgUOB09pN9OtnpGpJ0mrN+F8krKpVwP3HtR0ytPx14AkTHLd4Ncd8hcG9EEnSiPgv0SVJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdZnxf0i4Ptl10XyWHrP/qMuQpA2CMxBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZZNRFzCbll2zksVHnDbqMiQBy4/Zf9QlaB05A5EkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV26AiTJUUkuTnJhkvOTPHmS/ZYkee/Q+qZJrmzHnJ/kF0muGVrfbC3Pv2+SU3tqlyRNjyl/F1aSPYEDgN2r6o4kC4AJX/iraimwdKhpb+DUqjq89XU0cEtVvWuqdUiSRqtnBrIQuKGq7gCoqhuq6tokT0xyVpILkpyTZOsJZgrPAb48UadJ9kjyrSTnJjk9ycLW/ogkX2v9npfk4e2QrZKckuSSJJ9Iko6xSJI69QTIGcD2SS5L8oEk+7RLTycDf11VjweeCdw2wbFPA84c35hkU+BY4MCq2gM4EXh72/wJ4P2t372A61r7E4DXALsAOwF/PFGxSQ5NsjTJ0lW/W9kxXEnSRKZ8CauqbkmyB/AUBoFwMoMX++uq6odtn98CDE8KkiwCbqqq303Q7SOBxwJfbcfMA65LsjWwqKo+3/q9fajfc6pqRVs/H1gMfHeCeo8HjgfYfOHONdXxSpIm1vX3QKpqFYOZxJlJlgF/tRaHPQc4fZJtAS6uqj3v1TgIkMncMbS8io3sb5tI0qhN+RJWkkcm2XmoaTfgJ8DCJE9s+2ydZPwL+qT3P4BLge3aDfqxT2s9pqpuBlYkeUFr3zzJllOtWZI0/XretW8FHJtkG+Au4HLgUOAjrf2+DO5/PHPsgCTzgEdU1SUTdVhV/5XkQOC9Sea3ut4DXAy8DPhgkrcBdwJ/1lGzJGmapWrmbwsk2R
t4aVUdNuMnW43NF+5cCw9+zyhLkNT4J23nhiTnVtWSibbNyn2DqvouE9zgliTNXX6ViSSpiwEiSepigEiSuhggkqQuBogkqYsBIknqslF9/ceui+az1M+eS9K0cAYiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQum4y6gNm07JqVLD7itFGXIWk9svyY/UddwpzlDESS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZUYCJMlRSS5OcmGS85M8eRr63DfJXtNRnyRp3U37V5kk2RM4ANi9qu5IsgDYbB373ATYF7gFOGudi5QkrbOZ+C6shcANVXUHQFXdAJBkOfAZYD/gNuDPq+ryJIuBE4EFwK+AV1TVz5OcBNwOPAG4BtgLWJXkpcDhwIOBvwNWASur6qkzMBZJ0iRm4hLWGcD2SS5L8oEk+wxtW1lVuwLvA97T2o4FPlpVjwM+Abx3aP+HAntV1QuB44B/qardquo7wFuAZ1fV44HnTVZMkkOTLE2ydNXvVk7bICVpYzftAVJVtwB7AIcymFGcnOSQtvlTQz/3bMt7Ap9syx8D9h7q7rNVtWqSU30POCnJ/wTmraae46tqSVUtmbfl/KkOR5I0iRn5Ovf2on8mcGaSZcDBY5uGd1uLrm5dzTkOazfn9wfOTbJHVd3YWbIkaYqmfQaS5JFJdh5q2g24qi2/aOjn99vyWcBBbfklwHcm6fpmYOuh8zy8qs6uqrcwmOlsPw3lS5LW0kzMQLYCjk2yDXAXcDmDy1kHANsmuRC4A3hx2/9w4CNJ/oZ2E32Sfr8EnJLk+e2Y17agCvB14IIZGIskaRLTHiBVdS6DT0zdSxKAd1bVG8ftfxXw9An6OWTc+mXA44aaJpupSJJmgf8SXZLUZdb+JnpVLZ6tc0mSZp4zEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZdY+xrs+2HXRfJYes/+oy5CkDYIzEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUxQCRJHUxQCRJXQwQSVIXA0SS1MUAkSR1MUAkSV0MEElSFwNEktRlk1EXMJuWXbOSxUecNuoyJGnWLD9m/xnr2xmIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqcsaAyTJqiTnJ7kgyXlJ9uo5UZLDkry851hJ0vpnbb7K5Laq2g0gybOBfwT2meqJquq4qR4jSVp/TfUS1v2BXwMk2TfJqWMbkrwvySFt+ZgkP05yYZJ3tbajk7yhLZ+Z5B1JzklyWZKntPZ5Sd6Z5Ift2P/V2hcm+XabCV2U5Clt35Pa+rIkr13n34Ykaa2tzQzkvknOB7YAFgJPX93OSR4I/HfgUVVVSbaZ7NxV9aQkzwX+Dngm8CpgZVU9McnmwPeSnAG8EDi9qt6eZB6wJbAbsKiqHtvOO9l5JEkzYKqXsPYE/i3JY1ez/0rgduDDbYZy6iT7/Xv7eS6wuC0/C3hckgPb+nxgZ+CHwIlJNgX+o6rOT3IFsFOSY4HTgDMmOkmSQ4FDAebdf7s1jVWStJamdAmrqr4PLAC2A+4ad/wWbZ+7gCcBpwAHAF+ZpLs72s9V3BNkAQ6vqt3aY8eqOqOqvg08FbgGOCnJy6vq18DjgTOBw4ATJqn5+KpaUlVL5m05fyrDlSStxpT+HkiSRwHzgBuBq4Bd2qWm+wLPAL6bZCtgy6r6zyTfA66YwilOB16d5BtVdWeS/8YgNBYAK6rqQ+18uyf5T+C/qupzSS
4FPj6VsUiS1s1U7oHAYIZwcFWtAq5O8hngIuBK4Edtn62BLyTZou3/uinUcwKDy1nnJQnwK+AFwL7A3yS5E7gFeDmwCPhIkrFZ0JFTOI8kaR2lqkZdw6zZfOHOtfDg94y6DEmaNev6FwmTnFtVSyba5r9ElyR1MUAkSV0MEElSFwNEktTFAJEkdTFAJEldDBBJUhcDRJLUZUpfZTLX7bpoPkvX8R/VSJIGnIFIkroYIJKkLgaIJKmLASJJ6mKASJK6GCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqYsBIknqYoBIkrqkqkZdw6xJcjNw6ajrmGYLgBtGXcQMcFxzx4Y4JnBcY3aoqu0m2rBRfZ07cGlVLRl1EdMpydINbUzguOaSDXFM4LjWhpewJEldDBBJUpeNLUCOH3UBM2BDHBM4rrlkQxwTOK412qhuokuSps/GNgORJE0TA0SS1GWjCJAkz0lyaZLLkxwx6nrWJMmJSa5PctFQ2wOSfDXJT9vPbVt7kry3je3CJLsPHXNw2/+nSQ4exViGatk+yTeT/DjJxUn+urXP9XFtkeScJBe0cb21te+Y5OxW/8lJNmvtm7f1y9v2xUN9HdnaL03y7NGM6B5J5iX5UZJT2/qGMKblSZYlOT/J0tY2p5+DrZ5tkpyS5JIkP0my56yMq6o26AcwD/gZsBOwGXABsMuo61pDzU8FdgcuGmr7J+CItnwE8I62/Fzgy0CAPwLObu0PAK5oP7dty9uOcEwLgd3b8tbAZcAuG8C4AmzVljcFzm71fgY4qLUfB7y6Lf8lcFxbPgg4uS3v0p6bmwM7tufsvBE/D18HfBI4ta1vCGNaDiwY1zann4Otpo8Cf9GWNwO2mY1xjWzAs/iL3RM4fWj9SODIUde1FnUv5t4BcimwsC0vZPCPIgE+CLx4/H7Ai4EPDrXfa79RP4AvAH+yIY0L2BI4D3gyg3/pu8n45yBwOrBnW96k7Zfxz8vh/UY0locCXweeDpzaapzTY2o1LOcPA2ROPweB+cCVtA9Fzea4NoZLWIuAq4fWV7S2ueZBVXVdW/4F8KC2PNn41ttxt0scT2Dwbn3Oj6td6jkfuB74KoN32r+pqrvaLsM13l1/274SeCDr37jeA/wt8Pu2/kDm/pgACjgjyblJDm1tc/05uCPwK+Aj7ZLjCUnuxyyMa2MIkA1ODd4ezMnPXyfZCvgc8Jqq+u3wtrk6rqpaVVW7MXjX/iTgUSMuaZ0kOQC4vqrOHXUtM2Dvqtod2A/4qyRPHd44R5+DmzC45P3/quoJwK0MLlndbabGtTEEyDXA9kPrD21tc80vkywEaD+vb+2TjW+9G3eSTRmExyeq6t9b85wf15iq+g3wTQaXd7ZJMvZdc8M13l1/2z4fuJH1a1x/DDwvyXLg0wwuY/0rc3tMAFTVNe3n9cDnGQT+XH8OrgBWVNXZbf0UBoEy4+PaGALkh8DO7RMkmzG4yffFEdfU44vA2KciDmZwD2Gs/eXtkxV/BKxs09bTgWcl2bZ9+uJZrW0kkgT4MPCTqnr30Ka5Pq7tkmzTlu/L4L7OTxgEyYFtt/HjGhvvgcA32rvDLwIHtU807QjsDJwzO6O4t6o6sqoeWlWLGfz/8o2qeglzeEwASe6XZOuxZQbPnYuY48/BqvoFcHWSR7amZwA/ZjbGNcobWrN4k+m5DD718zPgqFHXsxb1fgq4DriTwbuLVzG4pvx14KfA14AHtH0DvL+NbRmwZKifVwKXt8crRjymvRlMoS8Ezm+P524A43oc8KM2rouAt7T2nRi8WF4OfBbYvLVv0dYvb9t3GurrqDbeS4H9Rv08bDXtyz2fwprTY2r1X9AeF4+9Fsz152CrZzdgaXse/geDT1HN+Lj8KhNJUpeN4RKWJGkGGCCSpC4GiCSpiwEiSepigEiSuhggkqQuBogkqcv/B9tT9egetiSGAAAAAElFTkSuQmCC\n"
          },
          "metadata": {
            "needs_background": "light"
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "##7.Make Predictions with the model"
      ],
      "metadata": {
        "id": "KHxOy5o9uyTG"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "predctions = pipeline.predict(df,output_level = 'document')"
      ],
      "metadata": {
        "id": "QpmPv2q5uVyd"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "predctions = predctions.dropna(subset = ['classified_sequence'])"
      ],
      "metadata": {
        "id": "rvyeXaHfL9PP"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "predctions"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "j4JpnxjzMOsV",
        "outputId": "3a42de46-4008-49cf-8508-100d35557e7f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "       Unnamed: 0                                authors  category  \\\n",
              "0              11                                   None     World   \n",
              "1              23  Josh Smith and Christine Kim, Reuters     World   \n",
              "2              24                                   None     World   \n",
              "3              25                       Antonia Blumberg     World   \n",
              "4              26                                   None     World   \n",
              "...           ...                                    ...       ...   \n",
              "17253      200848                       Reuters, Reuters  Sci/Tech   \n",
              "17254      200849                                   None    Sports   \n",
              "17255      200850                                   None    Sports   \n",
              "17256      200851                                   None    Sports   \n",
              "17257      200852                                   None    Sports   \n",
              "\n",
              "      classified_sequence classified_sequence_confidence        date  \\\n",
              "0                 [World]                   [0.99361455]  2018-05-26   \n",
              "1                 [World]                    [0.9782558]  2018-05-25   \n",
              "2                 [World]                   [0.99901104]  2018-05-25   \n",
              "3                 [World]                   [0.60049206]  2018-05-25   \n",
              "4                 [World]                   [0.99780566]  2018-05-25   \n",
              "...                   ...                            ...         ...   \n",
              "17253          [Sci/Tech]                    [0.8938764]  2012-01-28   \n",
              "17254            [Sports]                    [0.9982187]  2012-01-28   \n",
              "17255            [Sports]                   [0.98307234]  2012-01-28   \n",
              "17256            [Sports]                    [0.9991338]  2012-01-28   \n",
              "17257            [Sports]                    [0.9991911]  2012-01-28   \n",
              "\n",
              "                                                document  \\\n",
              "0      South Korean President Meets North Korea's Kim...   \n",
              "1      North Korea Still Open To Talks After Trump Ca...   \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...   \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...   \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...   \n",
              "...                                                  ...   \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...   \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...   \n",
              "17255  Giants Over Patriots, Jets Over Colts Among Mo...   \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...   \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...   \n",
              "\n",
              "                                                headline  \\\n",
              "0      South Korean President Meets North Korea's Kim...   \n",
              "1      North Korea Still Open To Talks After Trump Ca...   \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...   \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...   \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...   \n",
              "...                                                  ...   \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...   \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...   \n",
              "17255  Giants Over Patriots, Jets Over Colts Among  M...   \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...   \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...   \n",
              "\n",
              "                                                    link  \\\n",
              "0      https://www.huffingtonpost.com/entry/south-kor...   \n",
              "1      https://www.huffingtonpost.com/entry/north-kor...   \n",
              "2      https://www.huffingtonpost.com/entry/mississau...   \n",
              "3      https://www.huffingtonpost.com/entry/irish-tra...   \n",
              "4      https://www.huffingtonpost.com/entry/ireland-a...   \n",
              "...                                                  ...   \n",
              "17253  https://www.huffingtonpost.com/entry/rim-ceo-t...   \n",
              "17254  https://www.huffingtonpost.com/entry/maria-sha...   \n",
              "17255  https://www.huffingtonpost.com/entry/super-bow...   \n",
              "17256  https://www.huffingtonpost.com/entry/aldon-smi...   \n",
              "17257  https://www.huffingtonpost.com/entry/dwight-ho...   \n",
              "\n",
              "                                       short_description  \\\n",
              "0      The two met to pave the way for a summit betwe...   \n",
              "1      Trump’s announcement came after repeated threa...   \n",
              "2      Fifteen people were taken to the hospital, thr...   \n",
              "3      Just try to read these #HomeToVote tweets with...   \n",
              "4                     Vote counting will begin Saturday.   \n",
              "...                                                  ...   \n",
              "17253  Verizon Wireless and AT&T are already promotin...   \n",
              "17254  Afterward, Azarenka, more effusive with the pr...   \n",
              "17255  Leading up to Super Bowl XLVI, the most talked...   \n",
              "17256  CORRECTION: An earlier version of this story i...   \n",
              "17257  The five-time all-star center tore into his te...   \n",
              "\n",
              "                                                    text  \n",
              "0      South Korean President Meets North Korea's Kim...  \n",
              "1      North Korea Still Open To Talks After Trump Ca...  \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...  \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...  \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...  \n",
              "...                                                  ...  \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...  \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...  \n",
              "17255  Giants Over Patriots, Jets Over Colts Among  M...  \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...  \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...  \n",
              "\n",
              "[15329 rows x 11 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-7b2d95a2-3861-4627-baf0-6804c7a30ec2\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>authors</th>\n",
              "      <th>category</th>\n",
              "      <th>classified_sequence</th>\n",
              "      <th>classified_sequence_confidence</th>\n",
              "      <th>date</th>\n",
              "      <th>document</th>\n",
              "      <th>headline</th>\n",
              "      <th>link</th>\n",
              "      <th>short_description</th>\n",
              "      <th>text</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>11</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>[World]</td>\n",
              "      <td>[0.99361455]</td>\n",
              "      <td>2018-05-26</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/south-kor...</td>\n",
              "      <td>The two met to pave the way for a summit betwe...</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>23</td>\n",
              "      <td>Josh Smith and Christine Kim, Reuters</td>\n",
              "      <td>World</td>\n",
              "      <td>[World]</td>\n",
              "      <td>[0.9782558]</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/north-kor...</td>\n",
              "      <td>Trump’s announcement came after repeated threa...</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>24</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>[World]</td>\n",
              "      <td>[0.99901104]</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/mississau...</td>\n",
              "      <td>Fifteen people were taken to the hospital, thr...</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>25</td>\n",
              "      <td>Antonia Blumberg</td>\n",
              "      <td>World</td>\n",
              "      <td>[World]</td>\n",
              "      <td>[0.60049206]</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/irish-tra...</td>\n",
              "      <td>Just try to read these #HomeToVote tweets with...</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>26</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>[World]</td>\n",
              "      <td>[0.99780566]</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/ireland-a...</td>\n",
              "      <td>Vote counting will begin Saturday.</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17253</th>\n",
              "      <td>200848</td>\n",
              "      <td>Reuters, Reuters</td>\n",
              "      <td>Sci/Tech</td>\n",
              "      <td>[Sci/Tech]</td>\n",
              "      <td>[0.8938764]</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/rim-ceo-t...</td>\n",
              "      <td>Verizon Wireless and AT&amp;T are already promotin...</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17254</th>\n",
              "      <td>200849</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>[Sports]</td>\n",
              "      <td>[0.9982187]</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/maria-sha...</td>\n",
              "      <td>Afterward, Azarenka, more effusive with the pr...</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17255</th>\n",
              "      <td>200850</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>[Sports]</td>\n",
              "      <td>[0.98307234]</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among Mo...</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among  M...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/super-bow...</td>\n",
              "      <td>Leading up to Super Bowl XLVI, the most talked...</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among  M...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17256</th>\n",
              "      <td>200851</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>[Sports]</td>\n",
              "      <td>[0.9991338]</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/aldon-smi...</td>\n",
              "      <td>CORRECTION: An earlier version of this story i...</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17257</th>\n",
              "      <td>200852</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>[Sports]</td>\n",
              "      <td>[0.9991911]</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/dwight-ho...</td>\n",
              "      <td>The five-time all-star center tore into his te...</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>15329 rows × 11 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-7b2d95a2-3861-4627-baf0-6804c7a30ec2')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-7b2d95a2-3861-4627-baf0-6804c7a30ec2 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-7b2d95a2-3861-4627-baf0-6804c7a30ec2');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 151
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 8. Evaluate Predictions"
      ],
      "metadata": {
        "id": "y03ZPigmGPYL"
      }
    },
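    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import classification_report\n",
        "# Each prediction is a one-element list (e.g. [World]), so compare its first item with the gold `category` label.\n",
        "y_pred = predctions['classified_sequence'].apply(lambda x: x[0])\n",
        "print(classification_report(predctions['category'], y_pred))"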
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yfJtQLGDu8nN",
        "outputId": "cd6dbb06-b707-4d0e-cca6-dd75cc9b37a8"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "              precision    recall  f1-score   support\n",
            "\n",
            "    Business       0.88      0.59      0.71      5077\n",
            "    Sci/Tech       0.63      0.90      0.74      3856\n",
            "      Sports       0.95      0.69      0.80      4221\n",
            "       World       0.56      0.86      0.68      2175\n",
            "\n",
            "    accuracy                           0.73     15329\n",
            "   macro avg       0.75      0.76      0.73     15329\n",
            "weighted avg       0.79      0.73      0.73     15329\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 9. Try a Different Model"
      ],
      "metadata": {
        "id": "NeC9CJ3AMeA3"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Load the ALBERT sequence classifier fine-tuned on AG News for comparison\n",
        "pipeline = nlu.load('en.classify.albert.ag_news')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "v8U7HMWfMEdd",
        "outputId": "132754de-8aaa-4eee-9e37-d5471a4fb955"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "albert_base_sequence_classifier_ag_news download started this may take some time.\n",
            "Approximate size to download 42.8 MB\n",
            "[OK!]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "predctions = pipeline.predict(df, output_level='document')"
      ],
      "metadata": {
        "id": "4KqBK0L2NTe3"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Drop rows for which the model returned no prediction, then display the result\n",
        "predctions = predctions.dropna(subset=['classified_sequence'])\n",
        "predctions"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "1r5NxeJhYL4-",
        "outputId": "24e8f641-9780-4ba6-feba-ad37ba4d611e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "       Unnamed: 0                                authors  category  \\\n",
              "0              11                                   None     World   \n",
              "1              23  Josh Smith and Christine Kim, Reuters     World   \n",
              "2              24                                   None     World   \n",
              "3              25                       Antonia Blumberg     World   \n",
              "4              26                                   None     World   \n",
              "...           ...                                    ...       ...   \n",
              "17253      200848                       Reuters, Reuters  Sci/Tech   \n",
              "17254      200849                                   None    Sports   \n",
              "17255      200850                                   None    Sports   \n",
              "17256      200851                                   None    Sports   \n",
              "17257      200852                                   None    Sports   \n",
              "\n",
              "      classified_sequence classified_sequence_confidence        date  \\\n",
              "0                   World                       0.994577  2018-05-26   \n",
              "1                   World                       0.988318  2018-05-25   \n",
              "2                   World                        0.99776  2018-05-25   \n",
              "3                   World                       0.942115  2018-05-25   \n",
              "4                   World                       0.999283  2018-05-25   \n",
              "...                   ...                            ...         ...   \n",
              "17253            Sci/Tech                       0.969372  2012-01-28   \n",
              "17254              Sports                        0.99914  2012-01-28   \n",
              "17255              Sports                       0.998942  2012-01-28   \n",
              "17256              Sports                       0.999581  2012-01-28   \n",
              "17257              Sports                       0.998041  2012-01-28   \n",
              "\n",
              "                                                document  \\\n",
              "0      South Korean President Meets North Korea's Kim...   \n",
              "1      North Korea Still Open To Talks After Trump Ca...   \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...   \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...   \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...   \n",
              "...                                                  ...   \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...   \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...   \n",
              "17255  Giants Over Patriots, Jets Over Colts Among Mo...   \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...   \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...   \n",
              "\n",
              "                                                headline  \\\n",
              "0      South Korean President Meets North Korea's Kim...   \n",
              "1      North Korea Still Open To Talks After Trump Ca...   \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...   \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...   \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...   \n",
              "...                                                  ...   \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...   \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...   \n",
              "17255  Giants Over Patriots, Jets Over Colts Among  M...   \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...   \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...   \n",
              "\n",
              "                                                    link  \\\n",
              "0      https://www.huffingtonpost.com/entry/south-kor...   \n",
              "1      https://www.huffingtonpost.com/entry/north-kor...   \n",
              "2      https://www.huffingtonpost.com/entry/mississau...   \n",
              "3      https://www.huffingtonpost.com/entry/irish-tra...   \n",
              "4      https://www.huffingtonpost.com/entry/ireland-a...   \n",
              "...                                                  ...   \n",
              "17253  https://www.huffingtonpost.com/entry/rim-ceo-t...   \n",
              "17254  https://www.huffingtonpost.com/entry/maria-sha...   \n",
              "17255  https://www.huffingtonpost.com/entry/super-bow...   \n",
              "17256  https://www.huffingtonpost.com/entry/aldon-smi...   \n",
              "17257  https://www.huffingtonpost.com/entry/dwight-ho...   \n",
              "\n",
              "                                       short_description  \\\n",
              "0      The two met to pave the way for a summit betwe...   \n",
              "1      Trump’s announcement came after repeated threa...   \n",
              "2      Fifteen people were taken to the hospital, thr...   \n",
              "3      Just try to read these #HomeToVote tweets with...   \n",
              "4                     Vote counting will begin Saturday.   \n",
              "...                                                  ...   \n",
              "17253  Verizon Wireless and AT&T are already promotin...   \n",
              "17254  Afterward, Azarenka, more effusive with the pr...   \n",
              "17255  Leading up to Super Bowl XLVI, the most talked...   \n",
              "17256  CORRECTION: An earlier version of this story i...   \n",
              "17257  The five-time all-star center tore into his te...   \n",
              "\n",
              "                                                    text  \n",
              "0      South Korean President Meets North Korea's Kim...  \n",
              "1      North Korea Still Open To Talks After Trump Ca...  \n",
              "2      2 Men Detonate Bomb Inside Indian Restaurant N...  \n",
              "3      Thousands Travel Home To Ireland To Vote On Ab...  \n",
              "4      Irish Voters Set To Liberalize Abortion Laws I...  \n",
              "...                                                  ...  \n",
              "17253  RIM CEO Thorsten Heins' 'Significant' Plans Fo...  \n",
              "17254  Maria Sharapova Stunned By Victoria Azarenka I...  \n",
              "17255  Giants Over Patriots, Jets Over Colts Among  M...  \n",
              "17256  Aldon Smith Arrested: 49ers Linebacker Busted ...  \n",
              "17257  Dwight Howard Rips Teammates After Magic Loss ...  \n",
              "\n",
              "[15329 rows x 11 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-9c3c1704-ce3a-48e3-bc73-692e64d64ba4\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>authors</th>\n",
              "      <th>category</th>\n",
              "      <th>classified_sequence</th>\n",
              "      <th>classified_sequence_confidence</th>\n",
              "      <th>date</th>\n",
              "      <th>document</th>\n",
              "      <th>headline</th>\n",
              "      <th>link</th>\n",
              "      <th>short_description</th>\n",
              "      <th>text</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>11</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>World</td>\n",
              "      <td>0.994577</td>\n",
              "      <td>2018-05-26</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/south-kor...</td>\n",
              "      <td>The two met to pave the way for a summit betwe...</td>\n",
              "      <td>South Korean President Meets North Korea's Kim...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>23</td>\n",
              "      <td>Josh Smith and Christine Kim, Reuters</td>\n",
              "      <td>World</td>\n",
              "      <td>World</td>\n",
              "      <td>0.988318</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/north-kor...</td>\n",
              "      <td>Trump’s announcement came after repeated threa...</td>\n",
              "      <td>North Korea Still Open To Talks After Trump Ca...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>24</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>World</td>\n",
              "      <td>0.99776</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/mississau...</td>\n",
              "      <td>Fifteen people were taken to the hospital, thr...</td>\n",
              "      <td>2 Men Detonate Bomb Inside Indian Restaurant N...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>25</td>\n",
              "      <td>Antonia Blumberg</td>\n",
              "      <td>World</td>\n",
              "      <td>World</td>\n",
              "      <td>0.942115</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/irish-tra...</td>\n",
              "      <td>Just try to read these #HomeToVote tweets with...</td>\n",
              "      <td>Thousands Travel Home To Ireland To Vote On Ab...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>26</td>\n",
              "      <td>None</td>\n",
              "      <td>World</td>\n",
              "      <td>World</td>\n",
              "      <td>0.999283</td>\n",
              "      <td>2018-05-25</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/ireland-a...</td>\n",
              "      <td>Vote counting will begin Saturday.</td>\n",
              "      <td>Irish Voters Set To Liberalize Abortion Laws I...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17253</th>\n",
              "      <td>200848</td>\n",
              "      <td>Reuters, Reuters</td>\n",
              "      <td>Sci/Tech</td>\n",
              "      <td>Sci/Tech</td>\n",
              "      <td>0.969372</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/rim-ceo-t...</td>\n",
              "      <td>Verizon Wireless and AT&amp;T are already promotin...</td>\n",
              "      <td>RIM CEO Thorsten Heins' 'Significant' Plans Fo...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17254</th>\n",
              "      <td>200849</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Sports</td>\n",
              "      <td>0.99914</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/maria-sha...</td>\n",
              "      <td>Afterward, Azarenka, more effusive with the pr...</td>\n",
              "      <td>Maria Sharapova Stunned By Victoria Azarenka I...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17255</th>\n",
              "      <td>200850</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Sports</td>\n",
              "      <td>0.998942</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among Mo...</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among  M...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/super-bow...</td>\n",
              "      <td>Leading up to Super Bowl XLVI, the most talked...</td>\n",
              "      <td>Giants Over Patriots, Jets Over Colts Among  M...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17256</th>\n",
              "      <td>200851</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Sports</td>\n",
              "      <td>0.999581</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/aldon-smi...</td>\n",
              "      <td>CORRECTION: An earlier version of this story i...</td>\n",
              "      <td>Aldon Smith Arrested: 49ers Linebacker Busted ...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17257</th>\n",
              "      <td>200852</td>\n",
              "      <td>None</td>\n",
              "      <td>Sports</td>\n",
              "      <td>Sports</td>\n",
              "      <td>0.998041</td>\n",
              "      <td>2012-01-28</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "      <td>https://www.huffingtonpost.com/entry/dwight-ho...</td>\n",
              "      <td>The five-time all-star center tore into his te...</td>\n",
              "      <td>Dwight Howard Rips Teammates After Magic Loss ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>15329 rows × 11 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9c3c1704-ce3a-48e3-bc73-692e64d64ba4')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-9c3c1704-ce3a-48e3-bc73-692e64d64ba4 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-9c3c1704-ce3a-48e3-bc73-692e64d64ba4');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 164
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "##10.Evaluate Predictions "
      ],
      "metadata": {
        "id": "CfzUZ7UtNd6I"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import classification_report\n",
        "print(classification_report(predctions['category'], predctions['classified_sequence']) )"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "1ff94fe8-7e30-4341-e64b-6e0b4715e93b",
        "id": "Hmtx1UIXNd6J"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "              precision    recall  f1-score   support\n",
            "\n",
            "    Business       0.90      0.64      0.74      5077\n",
            "    Sci/Tech       0.63      0.92      0.75      3856\n",
            "      Sports       0.95      0.81      0.87      4221\n",
            "       World       0.73      0.84      0.78      2175\n",
            "\n",
            "    accuracy                           0.79     15329\n",
            "   macro avg       0.80      0.80      0.79     15329\n",
            "weighted avg       0.82      0.79      0.79     15329\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VXu21c0iQRSC"
      },
      "source": [
        "# There are many more models you can put to use in 1 line of code!\n",
        "## Checkout [the Modelshub](https://nlp.johnsnowlabs.com/models) and the [NLU Namespace](https://nlu.johnsnowlabs.com/docs/en/spellbook) for more models\n",
        "\n",
        "### NLU Webinars and Video Tutorials\n",
        "- [NLU & Streamlit Tutorial](https://vimeo.com/579508034#)\n",
        "- [Crash course of the 50 + Medical Domains and the 200+ Healtchare models in NLU](https://www.youtube.com/watch?v=gGDsZXt1SF8)\n",
        "- [Multi Lingual NLU Webinar - Tutorial on Chinese News dataset](https://www.youtube.com/watch?v=ftAOqJuxnV4)\n",
        "- [John Snow Labs NLU: Become a Data Science Superhero with One Line of Python code](https://events.johnsnowlabs.com/john-snow-labs-nlu-become-a-data-science-superhero-with-one-line-of-python-code?hsCtaTracking=c659363c-2188-4c86-945f-5cfb7b42fcfc%7C8b2b188b-92a3-48ba-ad7e-073b384425b0)\n",
        "- [Python Web Def Conf - Python's NLU library: 1,000+ Models, 200+ Languages, State of the Art Accuracy, 1 Line of Code](https://2021.pythonwebconf.com/presentations/john-snow-labs-nlu-the-simplicity-of-python-the-power-of-spark-nlp)\n",
        "- [NYC/DC NLP Meetup with NLU](https://youtu.be/hJR9m3NYnwk?t=2155)\n",
        "\n",
        "### More ressources \n",
        "- [Join our Slack](https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA)\n",
        "- [NLU Website](https://nlu.johnsnowlabs.com/)\n",
        "- [NLU Github](https://github.com/JohnSnowLabs/nlu)\n",
        "- [Many more NLU example tutorials](https://github.com/JohnSnowLabs/nlu/tree/master/examples)\n",
        "- [Overview of every powerful nlu 1-liner](https://nlu.johnsnowlabs.com/docs/en/examples)\n",
        "- [Checkout the Modelshub for an overview of all models](https://nlp.johnsnowlabs.com/models) \n",
        "- [Checkout the NLU Namespace where you can find every model as a tabel](https://nlu.johnsnowlabs.com/docs/en/spellbook)\n",
        "- [Intro to NLU article](https://medium.com/spark-nlp/1-line-of-code-350-nlp-models-with-john-snow-labs-nlu-in-python-2f1c55bba619)\n",
        "- [Indepth and easy Sentence Similarity Tutorial, with StackOverflow Questions using BERTology embeddings](https://medium.com/spark-nlp/easy-sentence-similarity-with-bert-sentence-embeddings-using-john-snow-labs-nlu-ea078deb6ebf)\n",
        "- [1 line of Python code for BERT, ALBERT, ELMO, ELECTRA, XLNET, GLOVE, Part of Speech with NLU and t-SNE](https://medium.com/spark-nlp/1-line-of-code-for-bert-albert-elmo-electra-xlnet-glove-part-of-speech-with-nlu-and-t-sne-9ebcd5379cd)"
      ]
    }
  ]
}